Deep Motion Features for Visual Tracking
Robust visual tracking is a challenging computer vision problem, with many
real-world applications. Most existing approaches employ hand-crafted
appearance features, such as HOG or Color Names. Recently, deep RGB features
extracted from convolutional neural networks have been successfully applied to
tracking. Despite their success, these features capture only appearance
information. Motion cues, on the other hand, provide discriminative and
complementary information that can improve tracking performance. In contrast to
visual tracking, deep motion features have already been successfully applied to
action recognition and video classification tasks. Typically, the motion features are
learned by training a CNN on optical flow images extracted from large amounts
of labeled videos.
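
As a hedged illustration of this step, the sketch below turns two consecutive frames into an optical-flow image that could be fed to such a CNN. The use of OpenCV's Farnebäck flow, the frame paths, and the HSV colour encoding are illustrative assumptions, not the exact pipeline used in the paper.

```python
# A minimal sketch, assuming OpenCV and NumPy: compute dense optical flow
# between two consecutive frames and encode it as a 3-channel image suitable
# as CNN input. Paths, Farneback parameters, and the HSV encoding are
# illustrative assumptions only.
import cv2
import numpy as np

def flow_image(prev_path, curr_path, size=(224, 224)):
    """Return an RGB image encoding the optical flow from prev_path to curr_path."""
    prev = cv2.cvtColor(cv2.imread(prev_path), cv2.COLOR_BGR2GRAY)
    curr = cv2.cvtColor(cv2.imread(curr_path), cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((prev.shape[0], prev.shape[1], 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                              # hue: flow direction
    hsv[..., 1] = 255                                                # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value: flow magnitude
    return cv2.resize(cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB), size)
```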
This paper presents an investigation of the impact of deep motion features in
a tracking-by-detection framework. We further show that hand-crafted, deep RGB,
and deep motion features contain complementary information. To the best of our
knowledge, we are the first to propose fusing appearance information with deep
motion features for visual tracking. Comprehensive experiments clearly suggest
that our fusion approach with deep motion features outperforms standard methods
relying on appearance information alone.

Comment: ICPR 2016. Best paper award in the "Computer Vision and Robot Vision"
track.
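
The following sketch shows one way such a fusion could look in a tracking-by-detection step: each candidate box is scored with hand-crafted, deep RGB, and deep motion features, and the scores are combined. The extractor functions (hog_features, rgb_cnn, flow_cnn), classifier weights, and fusion weights are hypothetical placeholders, not the paper's actual components.

```python
# A hedged sketch of late fusion in a tracking-by-detection step. All feature
# extractors and weights are hypothetical placeholders supplied by the caller.
import numpy as np

def best_candidate(candidates, hog_features, rgb_cnn, flow_cnn,
                   w_hog, w_rgb, w_flow, fusion=(1.0, 1.0, 1.0)):
    """Return the index of the highest-scoring (rgb_patch, flow_patch) candidate."""
    scores = []
    for rgb_patch, flow_patch in candidates:
        s_hog  = w_hog  @ hog_features(rgb_patch)   # hand-crafted appearance score
        s_rgb  = w_rgb  @ rgb_cnn(rgb_patch)        # deep RGB appearance score
        s_flow = w_flow @ flow_cnn(flow_patch)      # deep motion score
        scores.append(fusion[0] * s_hog + fusion[1] * s_rgb + fusion[2] * s_flow)
    return int(np.argmax(scores))
```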
Visuell följning med hjälp av djup inlärning och optiskt flöde (Visual tracking using deep learning and optical flow)
Generic visual tracking is a challenging computer vision problem, where the position of a specified target is estimated through a sequence of frames. The only information given is the initial location of the target. The tracker therefore has to adapt to and learn any kind of object, which it describes through visual features used to differentiate the target from the background. Standard appearance features only capture momentary visual information. This master’s thesis investigates the use of deep features extracted from optical flow images processed by a deep convolutional network. The optical flow is calculated from two consecutive images and thereby captures the dynamic nature of the scene. Results show that this information is complementary to the standard appearance features and improves the tracker's performance.

Deep features are typically very high dimensional. Employing dimensionality reduction can increase both the efficiency and the performance of the tracker. As a second aim of this thesis, PCA and PLS were evaluated and compared. The evaluations show that the two methods perform almost equally well, with PLS achieving a slightly better score than the more widely used PCA.

The final proposed tracker was evaluated on three challenging datasets and was shown to outperform other state-of-the-art trackers.
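
As a rough illustration of the dimensionality-reduction comparison, the sketch below reduces high-dimensional features with both PCA and PLS using scikit-learn. The random data, the binary target/background labels, and the choice of 64 components are assumptions made for the example only.

```python
# A minimal sketch comparing PCA (unsupervised) and PLS (supervised) for
# reducing high-dimensional deep features; data and component count are
# illustrative assumptions only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4096))      # stand-in for deep features of 500 patches
y = rng.integers(0, 2, size=500)      # 1 = target patch, 0 = background patch

# PCA keeps the directions of maximum variance in X, ignoring the labels.
X_pca = PCA(n_components=64).fit_transform(X)

# PLS keeps the directions of X that covary most with the labels y.
X_pls = PLSRegression(n_components=64).fit_transform(X, y)[0]

print(X_pca.shape, X_pls.shape)       # both (500, 64)
```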